Recycling contamination occurs when waste is incorrectly disposed of - like recycling a greasy pizza box (it belongs in compost) - or when waste is correctly disposed of but incorrectly prepared - like recycling unrinsed jam jars.
Contamination is a huge problem in the recycling industry that can be mitigated with automated waste sorting. Just for kicks, I thought I'd try my hand at prototyping an image classifier to classify trash and recyclables - this classifier could have applications in an optical sorting system.
In this project, I'll train a convolutional neural network to classify an image as either cardboard, glass, metal, paper, plastic, or trash with the fastai library (built on PyTorch). I used an image dataset collected manually by Gary Thung and Mindy Yang. Download their dataset here to follow along, then move it to the same directory as this notebook. (Note: you'll want to use a GPU to speed up training.)
%reload_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
from fastai.vision import *
from fastai.metrics import error_rate
from pathlib import Path
from glob2 import glob
from sklearn.metrics import confusion_matrix
import pandas as pd
import numpy as np
import os
import zipfile as zf
import shutil
import random
import re
import matplotlib.pyplot as plt
import seaborn as sns
First, we need to extract the contents of "dataset-resized.zip".
files = zf.ZipFile("dataset-resized.zip",'r')
files.extractall()
files.close()
Once unzipped, the dataset-resized folder has six subfolders:
os.listdir(os.path.join(os.getcwd(),"dataset-resized"))
Now that we've extracted the data, I'm going to split the images up into train, validation, and test image folders with a 50-25-25 split. First, I'll define some functions that will help me quickly build it. If you're not interested in how the data set is built, you can just run this and ignore it.
## helper functions ##
## splits indices for a folder into train, validation, and test indices with random sampling
## input: folder path
## output: train, valid, and test indices
def split_indices(folder,seed1,seed2):
    n = len(os.listdir(folder))
    full_set = list(range(1,n+1))
    ## train indices
    random.seed(seed1)
    train = random.sample(list(range(1,n+1)),int(.5*n))
    ## temp
    remain = list(set(full_set)-set(train))
    ## separate remaining into validation and test
    random.seed(seed2)
    valid = random.sample(remain,int(.5*len(remain)))
    test = list(set(remain)-set(valid))
    return(train,valid,test)

## gets file names for a particular type of trash, given indices
## input: waste category and indices
## output: file names
def get_names(waste_type,indices):
    file_names = [waste_type+str(i)+".jpg" for i in indices]
    return(file_names)

## moves group of source files to another folder
## input: list of source files and destination folder
## no output
def move_files(source_files,destination_folder):
    for file in source_files:
        shutil.move(file,destination_folder)
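To make the split logic concrete, here's a standalone sketch of the same 50-25-25 sampling on a toy set of 100 indices (this is just an illustration of the approach, not the notebook's split_indices function itself):

```python
import random

# Toy illustration of the 50-25-25 split on 100 indices.
n = 100
full_set = list(range(1, n + 1))

random.seed(1)
train = random.sample(full_set, int(0.5 * n))          # 50% -> train

remain = list(set(full_set) - set(train))
random.seed(1)
valid = random.sample(remain, int(0.5 * len(remain)))  # 25% -> valid
test = list(set(remain) - set(valid))                  # 25% -> test

print(len(train), len(valid), len(test))  # 50 25 25
```

Because the remaining indices are split in half, the three sets are disjoint and together cover every image exactly once.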
Next, I'm going to create a bunch of destination folders according to the ImageNet directory convention. It'll look like this:
/data
  /train
    /cardboard
    /glass
    /metal
    /paper
    /plastic
    /trash
  /valid
    /cardboard
    /glass
    /metal
    /paper
    /plastic
    /trash
  /test
Each image file is just the material name and a number (e.g. cardboard1.jpg).
Again, this is just housekeeping to organize my files.
## paths will be train/cardboard, train/glass, etc...
subsets = ['train','valid']
waste_types = ['cardboard','glass','metal','paper','plastic','trash']
## create destination folders for data subset and waste type
for subset in subsets:
    for waste_type in waste_types:
        folder = os.path.join('data',subset,waste_type)
        if not os.path.exists(folder):
            os.makedirs(folder)

if not os.path.exists(os.path.join('data','test')):
    os.makedirs(os.path.join('data','test'))

## move files to destination folders for each waste type
for waste_type in waste_types:
    source_folder = os.path.join('dataset-resized',waste_type)
    train_ind, valid_ind, test_ind = split_indices(source_folder,1,1)

    ## move source files to train
    train_names = get_names(waste_type,train_ind)
    train_source_files = [os.path.join(source_folder,name) for name in train_names]
    train_dest = "data/train/"+waste_type
    move_files(train_source_files,train_dest)

    ## move source files to valid
    valid_names = get_names(waste_type,valid_ind)
    valid_source_files = [os.path.join(source_folder,name) for name in valid_names]
    valid_dest = "data/valid/"+waste_type
    move_files(valid_source_files,valid_dest)

    ## move source files to test
    test_names = get_names(waste_type,test_ind)
    test_source_files = [os.path.join(source_folder,name) for name in test_names]
    ## test images all go into data/test, since they don't need to be sorted by label
    move_files(test_source_files,"data/test")
I set the seed for both random samples to be 1 for reproducibility. Now that the data's organized, we can get to model training.
## get a path to the folder with images
path = Path(os.getcwd())/"data"
path
tfms = get_transforms(do_flip=True,flip_vert=True)
data = ImageDataBunch.from_folder(path,test="test",ds_tfms=tfms,bs=16)
The batch size bs is the number of images trained on at a time; choose a smaller batch size if your machine has less memory.
You can use the get_transforms() function to augment your data; here I allow images to be flipped both horizontally and vertically.
data
print(data.classes)
Here's an example of what the data looks like:
data.show_batch(rows=4,figsize=(10,8))
learn = create_cnn(data,models.resnet34,metrics=error_rate)
A residual neural network is a convolutional neural network (CNN) with lots of layers. In particular, resnet34 is a CNN with 34 layers that's been pretrained on the ImageNet database. A pretrained CNN will perform better on new image classification tasks because it has already learned some visual features and can transfer that knowledge over (hence transfer learning).
Since they can represent more complex functions, deep neural networks should theoretically outperform shallow ones on training data. In practice, though, very deep plain networks often perform worse than shallower ones - this is known as the degradation problem.
Resnets were created to circumvent this problem using shortcut connections. If some nodes in a layer have suboptimal values, their weights and biases are adjusted; if a node is already optimal (its residual is 0), it's left alone. Adjustments are made only on an as-needed basis - where the residuals are non-zero.
When no adjustment is needed, the shortcut connection applies the identity function to pass information unchanged to subsequent layers. This effectively shortens the network where possible, allowing resnets to have deep architectures while behaving more like shallow networks. The 34 in resnet34 just refers to the number of layers.
Anand Saha gives a more in-depth explanation here.
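The shortcut idea can be sketched in a few lines of plain numpy (a toy residual block, not fastai's or resnet34's actual implementation): the block outputs its learned transformation F(x) plus the unchanged input x, so if F's weights collapse to zero the block reduces to the identity and the layer effectively disappears.

```python
import numpy as np

def residual_block(x, W):
    """Toy residual block: output = F(x) + x, with F a single ReLU layer."""
    Fx = np.maximum(0, W @ x)  # the learned transformation F(x)
    return Fx + x              # shortcut connection adds the unchanged input

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# With zero weights, F(x) = 0 and the block is exactly the identity:
y = residual_block(x, np.zeros((4, 4)))
print(np.allclose(y, x))  # True
```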
learn.model
learn.lr_find(start_lr=1e-6,end_lr=1e1)
learn.recorder.plot()
The learning rate finder suggests a learning rate of 5.13e-03. With this, we can train the model.
learn.fit_one_cycle(20,max_lr=5.13e-03)
I ran my model for 20 epochs. What's cool about this fitting method is that the learning rate rises and then falls over the course of the cycle rather than staying fixed, which helps training converge to a good optimum. At 8.6%, the validation error looks great... let's see how it performs on the test data, though.
First, we can take a look at which images were most incorrectly classified.
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
interp.plot_top_losses(9, figsize=(15,11))
The images the classifier performed worst on were actually degraded - it looks like the photos were overexposed - so this isn't really a fault of the model!
doc(interp.plot_top_losses)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
This model often confused plastic for glass and confused metal for glass. The list of most confused images is below.
interp.most_confused(min_val=2)
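Under the hood, most_confused just reads off the large off-diagonal cells of the confusion matrix. Here's a rough numpy sketch of that computation on a toy 3-class matrix (the class names and counts are made up for illustration):

```python
import numpy as np

classes = ["glass", "metal", "plastic"]  # toy labels
cm = np.array([[50, 2, 6],
               [9, 40, 1],
               [7, 0, 45]])  # rows = actual, columns = predicted

# Collect (actual, predicted, count) for every off-diagonal cell at or
# above a threshold, sorted by count descending.
min_val = 2
confused = sorted(
    [(classes[r], classes[c], int(cm[r, c]))
     for r in range(len(classes)) for c in range(len(classes))
     if r != c and cm[r, c] >= min_val],
    key=lambda t: t[2], reverse=True)
print(confused)
```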
To see how this model really performs, we need to make predictions on the test data. First, I'll make predictions using the learner.get_preds() method.
Note: learner.predict() only predicts on a single image, while learner.get_preds() predicts on a set of images. I highly recommend reading the documentation to learn more about predict() and get_preds().
preds = learn.get_preds(ds_type=DatasetType.Test)
The ds_type argument in get_preds() takes a DatasetType value: DatasetType.Train, DatasetType.Valid, or DatasetType.Test. I mention this because I made the mistake of passing in the actual data (learn.data.test_ds), which gave me the wrong output and took embarrassingly long to debug.
Don't make this mistake! Don't pass in data -- pass in the dataset type!
print(preds[0].shape)
preds[0]
These are the predicted probabilities for each image. This tensor has 365 rows -- one for each image -- and 6 columns -- one for each material category.
data.classes
Now I'm going to convert the probabilities in the tensor above to a string with one of the class names.
## saves the index (0 to 5) of most likely (max) predicted class for each image
max_idxs = np.asarray(np.argmax(preds[0],axis=1))
yhat = []
for max_idx in max_idxs:
    yhat.append(data.classes[max_idx])
yhat
These are the predicted labels of all the images! Let's check if the first image is actually glass.
learn.data.test_ds[0][0]
It is!
Next, I'll get the actual labels from the test dataset.
y = []
## convert POSIX paths to string first
for label_path in data.test_ds.items:
    y.append(str(label_path))

## then extract waste type from file path
pattern = re.compile("([a-z]+)[0-9]+")
for i in range(len(y)):
    y[i] = pattern.search(y[i]).group(1)
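As a quick sanity check of the pattern: it grabs the run of lowercase letters immediately followed by digits, which pulls the material name out of filenames like these (example paths are made up):

```python
import re

pattern = re.compile("([a-z]+)[0-9]+")

# The first letters-then-digits run in the path is the material name.
print(pattern.search("data/test/glass12.jpg").group(1))      # glass
print(pattern.search("data/test/cardboard101.jpg").group(1)) # cardboard
```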
A quick check.
## predicted values
print(yhat[0:5])
## actual values
print(y[0:5])
It looks like the first five predictions match up!
How did we end up doing? Again we can use a confusion matrix to find out.
cm = confusion_matrix(y,yhat)
print(cm)
Let's try and make this matrix a little prettier.
df_cm = pd.DataFrame(cm,waste_types,waste_types)
plt.figure(figsize=(10,8))
sns.heatmap(df_cm,annot=True,fmt="d",cmap="YlGnBu")
Again, the model seems to have confused metal for glass and plastic for glass. With more time, I'm sure further investigation could help reduce these mistakes.
correct = 0
for r in range(len(cm)):
    for c in range(len(cm)):
        if (r==c):
            correct += cm[r,c]
accuracy = correct/sum(sum(cm))
accuracy
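The double loop above just sums the diagonal of the confusion matrix, so the same number falls out of a one-line numpy version (shown here on a toy matrix, not the notebook's actual results):

```python
import numpy as np

cm = np.array([[50, 2, 6],
               [9, 40, 1],
               [7, 0, 45]])  # toy confusion matrix

# Correct predictions sit on the diagonal, so accuracy = trace / total.
accuracy = np.trace(cm) / cm.sum()
print(round(accuracy, 3))
```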
I ended up achieving an accuracy of 92.1% on the test data which is pretty great -- the original creators of the TrashNet dataset achieved a test accuracy of 63% with a support vector machine on a 70-30 test-train split (they trained a neural network as well for a test accuracy of 27%).
## delete everything when you're done to save space
shutil.rmtree("data")
shutil.rmtree('dataset-resized')
If I had more time, I'd go back and reduce classification error for glass in particular. I'd also delete photos from the dataset that are overexposed, since those images are just bad data.
This was just a quick-and-dirty mini-project, but it's pretty amazing how quickly you can train a near state-of-the-art image classification model with the fastai library. If you have an application you're interested in but don't think you have the machine learning chops, this should be encouraging.